Accumulated Kullback divergence for analysis of ASR performance in the presence of noise
Abstract
In this paper, the accumulated Kullback divergence (AKD) is used to analyze ASR performance deterioration due to the presence of background noise. The AKD represents a distance between the feature value distribution observed during training and the distribution of the observations in the noisy test condition for each individual feature vector component. In our experiments, the AKD summed over all feature vector components shows a high correlation with word error rate, and the AKD computed per component can be used to pinpoint those feature vector components that substantially contribute to recognition errors. It is argued that the distance measure could be a useful evaluation tool for analyzing the strengths and weaknesses of existing noise robustness approaches, and might help to suggest research strategies that focus on those elements of the acoustic feature vector that are most severely affected by the noise.

This work was partially supported by the European project Speech Driven Multi-modal Automatic Directory Assistance (SMADA). The SMADA project is partially funded by the European Commission, under the Action Line Human Language Technology in the 5th Framework IST Programme.

1. INTRODUCTION

Usually, ASR engines are trained with speech that has been acquired in a relatively quiet environment. Thus, the statistics of the individual components of the acoustic vectors mainly reflect variation that can be attributed to intra- and inter-speaker differences in the speech sounds. In the presence of background noise, some or all of the acoustic vector components will show statistics that differ from those on which the ASR engine was trained. As a consequence, ‘noisy’ acoustic vectors associated with a given speech unit may be distributed differently from the probability density function (pdf) that describes the clean data for that unit in model space. Such differences will likely increase the word error rate (WER).

At least two ways exist to make ASR more noise robust: (1) find a feature representation that is inherently noise robust, i.e., insensitive to background noise in the sense that the observed feature values are hardly affected by the presence of noise, and (2) apply noise reduction, i.e., estimate the disturbances caused by the background noise and compensate for them.

The effectiveness of a given noise robustness approach is conventionally evaluated by monitoring WER. However, WER is a crude measure that does not disclose the mechanisms underlying an improvement (or the causes of a failure to find improvement). A tool that does provide more direct access to the underlying mechanisms is therefore needed. For the case of inherently noise robust features, such a tool should provide a metric with which the change in observation distributions in acoustic feature space due to the noise can be quantified (and subsequently be minimized). The same holds for noise reduction techniques: if we had a tool to measure the distribution differences between clean data and noise-reduced data, it would be easier to design the ‘ideal’ noise reduction technique.

From the literature on noise robust ASR, it is evident that the relation between WER and signal-to-noise ratio (SNR) is far from simple. At a given SNR, the error rate is strongly dependent on the type of noise and the type of acoustic features. Until now, it has not been possible to predict which type of feature representation is most resistant to a particular type of noise.
Here too, a tool that allows one to analyze the distance between clean training data and noisy test data would be a step towards a better understanding of the issue.

In this paper, we present a measure based on the Kullback divergence [1, 2] as a means to describe training-test mismatch. The measure describes the average distance between the statistical distributions of the test data and the distributions as observed on the training data. An important property of this measure is that it allows one to quantify the relative contributions of the individual components of the acoustic vectors. As a result, it is possible to identify those vector components that contribute most to the distance measure. We therefore think that the distance measure may be a first step towards the desired tool referred to earlier.

We want to illustrate the viability of this approach in the context of a digit recognizer that has been trained on clean data and tested in (simulated) noisy conditions. To that aim, we investigated the distance measure in combination with the changes in WER when training-test mismatch is selectively and artificially removed from those vector components that appear to have the largest relative distance contributions. This allowed us to study whether repairing components with a large contribution to the distance substantially increases recognition performance, and to confirm whether the measure indeed has the intended diagnostic properties.

2. ACCUMULATED KULLBACK DIVERGENCE

The Kullback divergence is a well-known measure for the distance between two statistical distributions [1, 2]. If we denote the observation distributions for the train and test condition as d_trn and d_tst, respectively, the Kullback divergence K quantifies the training-test mismatch.
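The excerpt ends before the paper's own equation. As a reference point only, the textbook Kullback divergence from [1, 2] is sketched below for a single feature vector component i; whether the paper uses this asymmetric form or Kullback's symmetric variant cannot be confirmed from this excerpt.

```latex
% Asymmetric Kullback divergence between the training and test
% observation densities of a single feature vector component i:
\[
  K_i \bigl( d_{\mathrm{trn}} \,\|\, d_{\mathrm{tst}} \bigr)
  = \int d_{\mathrm{trn}}(x_i) \,
    \log \frac{d_{\mathrm{trn}}(x_i)}{d_{\mathrm{tst}}(x_i)} \, dx_i .
\]
% Kullback's original (symmetric) divergence adds the reverse direction:
\[
  J_i = K_i \bigl( d_{\mathrm{trn}} \,\|\, d_{\mathrm{tst}} \bigr)
      + K_i \bigl( d_{\mathrm{tst}} \,\|\, d_{\mathrm{trn}} \bigr) .
\]
% Summing the per-component divergences over all N components of the
% acoustic vector gives the accumulated Kullback divergence (AKD):
\[
  \mathrm{AKD} = \sum_{i=1}^{N} K_i .
\]
```

The per-component terms K_i are what makes the measure diagnostic: ranking them identifies the feature vector components most affected by the noise, which is exactly what the selective-repair experiments described above exploit.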
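To make the computation concrete, here is a minimal numerical sketch. It assumes that each feature vector component is modeled by a single Gaussian in both the clean training and the noisy test condition, for which the Kullback divergence has a closed form; this excerpt does not state how d_trn and d_tst are actually estimated in the paper (e.g., per HMM state, or via histograms), so the function names and the synthetic data below are illustrative only.

```python
import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two 1-D Gaussians (elementwise)."""
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

def accumulated_kd(train_feats, test_feats, symmetric=True):
    """Per-component Kullback divergence and its sum (the AKD).

    train_feats, test_feats: arrays of shape (n_frames, n_components),
    e.g. acoustic feature vectors from the clean training set and the
    noisy test set. A single Gaussian per component is assumed.
    """
    mu_trn, var_trn = train_feats.mean(0), train_feats.var(0)
    mu_tst, var_tst = test_feats.mean(0), test_feats.var(0)
    kd = kl_gauss(mu_trn, var_trn, mu_tst, var_tst)
    if symmetric:  # Kullback's symmetric divergence J = KL(p||q) + KL(q||p)
        kd = kd + kl_gauss(mu_tst, var_tst, mu_trn, var_trn)
    return kd, kd.sum()

# Example: 39-dimensional feature vectors; simulate a mismatch that grows
# with the component index, mimicking noise that affects some feature
# vector components more severely than others.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(5000, 39))
shift = np.linspace(0.0, 1.5, 39)            # per-component bias
noisy = clean + shift + rng.normal(0.0, 0.3, size=(5000, 39))

per_component, akd = accumulated_kd(clean, noisy)
worst = np.argsort(per_component)[::-1][:5]  # candidates to 'repair' first
print(f"AKD = {akd:.2f}; most affected components: {worst}")
```

Under this Gaussian assumption, the components with the largest per-component divergence are the natural candidates for the selective mismatch-removal experiment: repair them first and check whether WER drops accordingly.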